ROMANIAN ACADEMY, INSTITUTE OF CHEMISTRY TIMISOARA
AVRAM I. SORIN
The Study of the Biological Activity of Flavonoids by means of Computational Chemistry Methods

Authors

  • Ludovic Kurunczi
  • MIRCEA DIUDEA
  • MIHAI MEDELEANU
Abstract

Flavonoids are a highly explored chemical class of natural compounds, as we underline with a concise bibliometric exercise that opens the thesis. The study, conducted with data retrieved from the open-access PubMed library (in May 2011), indicates fifty thousand papers indexed for flavonoids, of which thirty-four thousand treat flavonoids as a main topic according to MeSH (Medical Subject Headings). The number of papers (and reviews) published in the last two decades shows an increasing trend, reaching over two thousand five hundred entries in 2010. The main journals are directed towards areas such as nutrition, medicine and phytochemistry. The introduction of the thesis gives a systematic description of the four main flavonoid classes (based on their chemical skeletons), i.e., common flavonoids, isoflavonoids, neoflavonoids and minor flavonoids. Their biochemical properties show them to be biologically reactive compounds. Section I.1.4, Virtual Collections, describes known flavonoid databases, among which we highlight the attempt by Kinoshita et al. [1] to include biological activities. A series of deficiencies (e.g., only two chemical skeletons, a reduced number of biological activities/targets, very limited applicability) and its lack of accessibility provide a strong reason for a new attempt to build a collection of flavonoids. Technological developments over the past two decades allow key processes in the pharmaceutical industry to be conducted more efficiently. Thus, emerging areas such as combinatorial chemistry (the synthesis of a large number of compounds in a short period of time) and high-throughput screening (HTS; biological activity determination for a large number of compounds in a short period of time) led to vast chemical libraries and biological data, consequently justifying the need for new computational methods capable of handling large numbers of compounds and defining the field of chem(o)informatics [2].
The second chapter, I.2 Elements of Chem(o)informatics, provides a concise description of the PubChem database (pubchem.ncbi.nlm.nih.gov/), and section I.2.1, PubChem (Bioassay), focuses on HTS experiments and their specific errors. The chem(o)informatics methods, and more specifically the virtual screening (VS) methods, applied in the second part of the thesis are introduced: similarity search, docking, pharmacophore search and evaluation parameters (in virtual screening). Virtual screening is considered the most important area of chem(o)informatics and aims to retrieve a small set of compounds with desired properties (usually compounds with a high probability of being bioactive on a specific target) from large chemical libraries.
The second part of the thesis describes our own findings and spans seven chapters. The first one refers to the Collection of Flavonoid-Related Compounds with Enhanced Biological Selectivity (CFRCEBS). We describe the workflow designed to retrieve the flavonoids. Starting from the main skeletons of the four classes of flavonoids, we retained from the PubChem database superstructures with biological activities measured in qHTS experiments and restricted by some essential conditions, the most important being the presence of a detergent in the assay, which prevents small-molecule aggregation in biochemical assays. To avoid a series of errors specific to this type of test, a set of successive filters was applied. Thereby, we identified and excluded frequent hitters, aggregators (retrieved compounds with Hill factor 0.5 – 2), known autofluorescent compounds and luciferase inhibitors.
[1] Kinoshita T, Lepp Z, Kawai Y, Terao J, Chuman H. 2006. An integrated database of flavonoids. BioFactors (Oxford, England) 26:179–188.
[2] Leach AR, Gillet VJ. 2007. An Introduction to Chemoinformatics (Revised Edition). Springer, Dordrecht, The Netherlands. ISBN 978-1-4020-6290-2.
A total of 3,412 compounds were retained, of which 529 were active in at least one assay. Of the 40 remaining assays, 21 were biochemical (ProtD) and 19 cellular (Cell). From these data we selected 12 subsets of flavonoid-related compounds (FRCs) representing the CFRCEBS collection, a platform intended for training prediction methods, performing validation studies of virtual screening methods and carrying out structure-activity relationship (SAR) analysis in molecular modeling. The random forests algorithm (an ensemble method comprising a large number of decision trees) was applied to classify the 529 active compounds and the 254 frequent hitters (FHs). The models showed prediction errors of around 10%. Analysis of the most important variables used to train the random forests underlines the importance of the BCUT (Burden-CAS-University of Texas eigenvalues) descriptors reflecting molecular weight and charge. In the future we intend to continue this analysis with other classification methods and molecular descriptors in order to improve the prediction accuracy and to understand the physico-chemical or topological molecular characteristics responsible for the high reactivity of some FRCs.
The same chapter describes the ProtD-type CFRCEBS sets employed in the following chapters to validate the methods developed in this thesis. A graphical representation of the chemical space described by the active and inactive FRCs confirms that the CFRCEBS sets are suited for evaluation studies. Furthermore, we justify the selection of crystal structures for docking experiments for five CFRCEBS subsets. The evaluation of VS methods has attracted increasing interest in recent years, mainly because of poor standards in the field.
We derived two new evaluation parameters, based on the ROC (receiver operating characteristic) curve, to measure the early enrichment capabilities of VS methods: AROCE (Addition of ROC Enrichments), which weights actives gradually by means of the false positive rate (FPR), in false positive (FP) relevance intervals, and eROCE (exponential ROC Enrichment), which weights every active by means of an exponential function applied to the inverse of the FPR. Thereby, the actives are weighted decreasingly from the top of the ranking list. Chapter II.2, AROCE and eROCE – new evaluation parameters in VS, also comprises a comparative study using sets of compounds with active-to-inactive ratios ranging from 1:10 to 1:1000. Several performance metrics were compared: TP% (percentage of true positives) at 1%, 5% and 10% FPR, AROCE, eROCE, BEDROC [4] and AUC (the area under the ROC curve). We demonstrated that, because of strict cut-offs, the TP% at x% FPR can be destabilized by large variations of actives across the top percent of the ranking list in successive evaluations (mostly at 1% FPR). In the case of eROCE and BEDROC, we varied the value of the α parameter, thereby focusing more on actives found up to the 1%, 5%, 8% and 10% FPR/number of active compounds. The results of this analysis confirm the limited usability of BEDROC (i.e., applicable only in scenarios with a much larger number of inactives compared to the actives), in contrast to AROCE, eROCE and AUC, which showed superior robustness even in situations with an active:inactive ratio of 1:10. This property proved beneficial for the evaluations conducted in the following studies.
The next chapter, II.3 Retrospective optimization in 2D similarity search, describes a complex evaluation of 2D similarity search.
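As an illustration only, the two standard ROC quantities named above, TP% at a given FPR cut-off and AUC, can be computed from a ranked list as sketched below. The abstract does not state the AROCE or eROCE formulas, so they are not reproduced here. The function names and the toy ranking are assumptions made for this example, and tied scores are assumed not to occur.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return the (FPR, TPR) points of a ranking sorted by decreasing score.

    labels: 1 for an active, 0 for an inactive. Tied scores are not treated
    specially (assumption: the ranking has no ties).
    """
    n_act = sum(1 for y in labels if y == 1)
    n_inact = len(labels) - n_act
    if n_act == 0 or n_inact == 0:
        raise ValueError("need at least one active and one inactive")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in ranked:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_inact, tp / n_act))
    return points


def tp_percent_at_fpr(points: List[Tuple[float, float]], fpr_cutoff: float) -> float:
    """Percentage of actives recovered before the FPR exceeds the cut-off."""
    return 100.0 * max(tpr for fpr, tpr in points if fpr <= fpr_cutoff)


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical toy ranking: 2 actives and 4 inactives.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 0, 1, 0, 0, 0]
pts = roc_points(scores, labels)
print(tp_percent_at_fpr(pts, 0.25), auc(pts))  # 100.0 0.875
```

With only four inactives, a single inactive moving up the list shifts the FPR by 0.25, which is the kind of cut-off sensitivity the comparison above attributes to TP% at x% FPR.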
2D similarity search is a simple method that measures the similarity between two molecules based on a common molecular representation, and it facilitates fast, efficient searches in large databases. Similarity searches in chemical libraries are conducted using a single reference structure or, more efficiently (if known), multiple references, a procedure known as group fusion. The similarity results obtained with several reference structures are then aggregated by means of a fusion rule, increasing the efficiency of the method. The study in this chapter was conducted on a set of 3,478 inhibitors of aldehyde dehydrogenase 1A1 (ALDH1A1) and 43,938 compounds inactive on the enzyme (data extracted from AID1030 – PubChem Bioassay), and subsequently on the AID1030 subset referenced in the CFRCEBS (comprising 111 active and 989 inactive FRCs). We evaluated the results obtained by varying 16 similarity coefficients, 5 molecular descriptors (binary fingerprints: MACCS and PubChemFP; real-valued: Autocorrelation, BCUT and Lipinski’s ‘rule of five’ descriptors) and two fusion rules (maxSim and sumSim) in the context of two reference set sizes (the second being ten times larger than the first). The analysis of the 278,240 similarity searches was performed by means of eROCE (α = 20), TP% at 1%, 5% and 10% FPR, and AUC. To analyze the relative performance of the large number of similarity coefficient-molecular descriptor combinations, the significantly different (p < 0.01) mean eROCE values (over 30 iterations) were represented in heatmaps.
[4] BEDROC: Boltzmann-Enhanced Discrimination of ROC.
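The group-fusion idea described above can be sketched for binary fingerprints with the Tanimoto coefficient and the two fusion rules. This is a minimal illustration, not the SSTICI implementation: fingerprints are modelled as sets of on-bit positions, and all names (`tanimoto`, `max_sim`, `sum_sim`, `rank_library`) and the toy data are assumptions.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence

Fingerprint = FrozenSet[int]  # positions of the bits set to 1


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient for binary fingerprints: |a AND b| / |a OR b|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def max_sim(query: Fingerprint, references: Sequence[Fingerprint]) -> float:
    """maxSim rule: keep the best similarity to any reference."""
    return max(tanimoto(query, ref) for ref in references)


def sum_sim(query: Fingerprint, references: Sequence[Fingerprint]) -> float:
    """sumSim rule: add up the similarities to all references."""
    return sum(tanimoto(query, ref) for ref in references)


def rank_library(library: Dict[str, Fingerprint],
                 references: Sequence[Fingerprint],
                 fuse: Callable[[Fingerprint, Sequence[Fingerprint]], float]) -> List[str]:
    """Order library compounds by decreasing fused similarity."""
    return sorted(library, key=lambda name: fuse(library[name], references), reverse=True)


# Hypothetical reference actives and library compounds.
references = [frozenset({1, 2, 3, 4}), frozenset({3, 4, 5, 6})]
library = {
    "p": frozenset({1, 2, 3}),     # close to the first reference only
    "q": frozenset({2, 3, 4, 5}),  # moderately close to both references
    "d": frozenset({9}),           # shares no bits with either reference
}
print(rank_library(library, references, max_sim))  # ['p', 'q', 'd']
print(rank_library(library, references, sum_sim))  # ['q', 'p', 'd']
```

The two rules disagree on `p` and `q`: maxSim rewards a strong match to one reference, while sumSim rewards moderate similarity to several.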
The results confirm the qualities of the Tanimoto coefficient applied to molecular fingerprints with the maxSim fusion rule, and highlight the importance of the Fossum coefficient, which, in addition to Tanimoto, offered significantly higher early enrichment and overall discriminative power for non-binary descriptors and the sumSim rule (the eROCE enrichment obtained with the larger reference set was equaled by the tenfold smaller set). Increasing the number of reference structures seemed to influence the sumSim fusion results and/or non-binary searches to a smaller extent (compared to the maxSim rule and binary representations). These observations were also verified on the CFRCEBS AID1030 set. In the case of molecular fingerprints and maxSim fusion, the two similarity coefficients gave comparable results. For sumSim fusion and/or non-binary searches, the Tanimoto coefficient showed eROCE and AUC values superior to Fossum. The promising results shown by Fossum and the Autocorrelation (and BCUT) descriptors on the larger, more chemically diverse AID1030 set (compared to the smaller AID1030 CFRCEBS set) encourage us to extend the evaluation to other biological targets in order to assess the extent of these observations. Discussions concerning the upper limits of 2D similarity search methods on the larger AID1030 set highlighted the relatively low performance of these methods in chemically diverse databases (e.g., the early recognition in the CFRCEBS was found to be four times larger compared to the extended AID1030 set).
The next two chapters comprise protein-ligand docking applications in virtual screening, modeling the ligand interactions (in the binding site of the receptor) and ligand binding affinity prediction. Chapter II.4, PLSDA-DOCET – a new ensemble method for consensus scoring functions, describes a method to construct a target-specific scoring function using a selection of energetic terms from different functions.
The protocol was named Partial Least Squares-Discriminant Analysis Docking Optimized Combined Energetic Terms (PLSDA-DOCET) because its first step uses the supervised Partial Least Squares-Discriminant Analysis learning algorithm to select a subset of energetic terms (cET) essential to discriminate between active and inactive compounds in the available dataset. From the energetic terms we sum those which give the best early enrichment and overall discriminative power, as indicated by AROCE, AUC and AROCE50 (the mean of AROCE and AUC). We conducted an evaluation study on five protein targets with datasets included in the DUD (Directory of Useful Decoys), in which the PLSDA-DOCET performance was compared with simple docking (using one scoring function) and with the combination of several scoring functions (both strategies considering the evaluation parameters mentioned above). Additionally, we included for comparison 2D (Extended Connectivity FingerPrint-4 and Functional-Class FingerPrint-4, with the Tanimoto similarity coefficient) and 3D (ROCS – Rapid Overlay of Chemical Structures) similarity searches. The evaluation of this method was performed according to both criteria essential to VS applications: early enrichment and scaffold hopping. The performances of the docking methods optimized on the DUD sets were validated by means of external sets referring to the same biological targets. These sets were constructed with bioactivity data drawn from the ChEMBL database (https://www.ebi.ac.uk/chembl/), according to the protocol used to assemble the DUD sets. The external validation results suggest only small mean differences between the two sets in the case of the cET optimized according to AUC and AROCE50. The DUD evaluation results highlighted the dependence of ligand-based methods (most evident in the case of ROCS) on the biological target.
Regarding the VS performance criteria, the PLSDA-DOCET protocol with AUC-optimized cET showed superior performance compared with the other methods. To explore the fusion potential of the outputs of the methods explored here, we measured the rank correlations of the chemotypes as ordered by the methods. We observed a different chemotype affinity for ligand-based methods compared to structure-based methods. Furthermore, we obtained quasi-complementary chemotype orderings between PLSDA-DOCET and the other docking strategies employed here. We conclude that data fusion of docking strategies and similarity searches could contribute to a significant increase in diverse chemotype prioritizations in VS applications. The evaluation of the PLSDA-DOCET protocol was subsequently extended to five CFRCEBS sets for which we identified X-ray structures. Additionally, the optimization parameters used in the DUD studies were supplemented with eROCE (α = 20). The results obtained here were, in terms of mean early enrichment values, at least two-fold poorer than the previously described DUD results. However, the eROCE-, AUC- and AROCE50-optimized cET performances indicate evidently superior active recognition, especially at 5% and 10% FPR and in AUC. Altogether, these results recommend the PLSDA-DOCET protocol as a viable and efficient method in VS.
During the comparative study of PLSDA-DOCET, we highlighted the poor docking results obtained for aldose reductase (ALR2), a protein whose binding site exhibits a remarkably high degree of conformational flexibility. Thus, in the next chapter, II.5 Challenges in docking 2’-hydroxy and 2’,4’-dihydroxychalcones into the ALR2 binding site, we investigated in detail the challenges arising from docking a series of thirty-eight 2’-hydroxychalcones into the binding site of the ALR2 receptor.
We show and discuss the results of docking into seven PDB structures, chosen to cover the five different conformations of the active site, including the crystal structure for which the co-crystallized ligands showed the maximal mean 2D similarity with the chalcone series. We also focused on the mean position error of the atoms in the crystal, which is reflected by Cruickshank’s diffraction-component precision index (DPI). Two docking programs were employed: FRED (with nine scoring functions) and AutoDock Vina. The docking analysis considered the relative scoring function values and the binding affinity prediction. Both gave confusing results. Some of the scoring functions in FRED and the AutoDock Vina scoring function indicate the ‘zopolrestat conformation’ (with a fully opened specificity pocket) to be the more likely one for the 2’-hydroxychalcones, while the rest of the scoring functions indicate diverse conformations to be more probable. The binding affinity analysis was conducted by means of the Kendall correlation coefficient. We obtained low but statistically significant correlations: the CGO (Chemical Gaussian Overlay) scorings of seven of the ALR2 ligand-binding site conformations correlated positively with the IC50 values. A detailed, systematic study of the binding models obtained with chalcones in each of the five ALR2 ligand-binding site conformations was performed, albeit considering the binding energies. The majority of the binding poses with the most favorable energies were found to be unrealistic, but among the energetically second-best poses we were able to identify and characterize the probable binding models. In a demonstrative study we highlighted the difficulties of scoring functions (and implicitly docking programs) in predicting the binding of an ALR2 inhibitor in the binding site of the enzyme, with emphasis on the opening of the specificity pocket.
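For readers unfamiliar with the Kendall correlation coefficient used in the binding affinity analysis above, the following is a minimal sketch of the tau-b variant, which corrects for ties. It is not the thesis code, it does not compute a significance level, and the score and IC50 values are invented purely for illustration.

```python
from math import sqrt
from typing import Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall rank correlation (tau-b) between two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: counted in neither factor
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    denominator = sqrt((untied + tied_x_only) * (untied + tied_y_only))
    return (concordant - discordant) / denominator if denominator else 0.0


# Hypothetical docking scores and IC50 values (micromolar) for four ligands.
docking_scores = [-9.1, -8.4, -7.9, -7.0]
ic50_values = [0.5, 1.2, 0.9, 4.0]
tau = kendall_tau_b(docking_scores, ic50_values)
print(round(tau, 3))  # 0.667
```

Five of the six ligand pairs are ordered the same way by score and by IC50 and one pair is reversed, giving tau = (5 - 1) / 6.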
We also drew attention to collateral data, i.e., the ability to inhibit aldehyde reductase (ALR1), which might point toward a preferred closed-state ALR2 binding site conformation for 2’-hydroxychalcones. Thus, we hypothesize that 5’-chloro-2’-hydroxychalcones bind into an ITB-like (2-(carboxymethyl)-1,3,3-trioxobenzo[e][1,2]benzothiazole-4-carboxylic acid) conformation, which is more probable for chalcones. Finally, we draw attention to the tendency to oversimplify the complexity of docking into ALR2, especially by using only one binding site conformation, as described in numerous studies in the literature. Some of the crystal structures employed for docking 2’-hydroxychalcones into the ALR2 binding site had no pre-calculated DPI values. Approximations of the Cruickshank DPI were reported by Blow and by Goto. We conducted a comparative study using fifty-five non-mutant, human ALR2 crystal structures and demonstrated that the Blow formula approximates the Cruickshank DPI more accurately, in contrast to the Goto formula, which tends to overestimate the reference DPI. The chapter ends with a graphical representation of the chemical space showing over six hundred ALR2 inhibitors with IC50 less than 10 μM (extracted from the ChEMBL database) and PDB co-crystallized ligands. We mapped the 2’-hydroxychalcone series in the resulting space and conclude that this chemical class is only sparsely explored compared to other chemical classes.
Chapter II.6, Using ROCS for pharmacophore elucidation and VS – a preliminary study, presents a new approach to using the popular ROCS three-dimensional similarity search tool. The vROCS interface was employed in a novel way to identify pharmacophores and establish a model, which was subsequently evaluated. This preliminary study was conducted on two CFRCEBS subsets focusing on the intranuclear receptors JMJD2A and BAZ2B, both capable of regulating gene expression.
Starting from the least flexible active compound, we retained the five most similar compounds (based on 3D similarity) from the pool of actives. We isolated superimposed chemical functional groups and constructed pharmacophore models, which were evaluated on the entire CFRCEBS subset. The results were compared to 3D models for which the 3D molecular shape was retained. The early enrichment values (measured by means of eROCE and TP% at 10% FPR) and the overall discriminative power (AUC) showed relatively low performance. However, among the models we could also identify encouraging results, e.g., for BAZ2B: TP% = 31.15% and AUC = 0.67. In the future we will adjust the pharmacophore elucidation algorithm and extend the evaluation to a large number of receptors, comparing the results with specialized pharmacophore elucidation programs.
Chapter II.7 describes the computer programs developed during this thesis. ETICI (Evaluation Tool In ChemoInformatics) is an evaluation program that computes nine evaluation metrics concurrently to characterize the performance of VS methods. We included well-established parameters as well as the new eROCE, AROCE and AROCE50. Moreover, an implementation of arithmetic weighting for scaffold hopping is available for all parameters. SSTICI (Similarity Search Tool In ChemoInformatics) was developed to facilitate similarity searches, offering 16 similarity coefficients with adjusted formulas for binary and non-binary searches. The third program has no graphical user interface and is named FHDPI (File Header Diffraction-component Position Index). It facilitates the computation of the Blow and Goto DPI values using the information available in the REMARK section of PDB files.
The thesis ends with the general conclusions, followed by a list of almost two hundred references.




Publication date: 2012